feat: add ContextCompressor for context overflow handling #885
alcholiclg merged 5 commits into modelscope:main
Conversation
Add a new context compression mechanism inspired by opencode's compaction approach. Features include:

- Token-based overflow detection
- Tool output pruning to reduce context size
- LLM-based conversation summarization

🤖 Generated with [Qoder](https://qoder.com)
Summary of Changes

This pull request introduces a robust context compression mechanism for agents, designed to prevent context window overflow during prolonged conversations. It achieves this by intelligently detecting token limits, pruning less critical tool outputs, and leveraging large language models to summarize conversation history, thereby maintaining relevant context without exceeding token constraints.
Code Review
This pull request introduces a ContextCompressor to manage context window overflow by pruning tool outputs and summarizing conversations. The implementation is well-structured and follows the described strategy. My review includes a few suggestions for improvement, mainly around making error handling more specific by avoiding broad exception catches, replacing a magic number with a constant for better maintainability, and simplifying a redundant conditional check.
```python
except Exception as e:
    logger.warning(f'Failed to init LLM for summary: {e}')
```
Catching a generic Exception is generally discouraged as it can mask unexpected errors during LLM initialization and make debugging difficult. It's better to catch more specific exceptions that you anticipate LLM.from_config() might raise, such as configuration or import errors. This makes the error handling more robust and intentional.
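A minimal sketch of the narrower handling suggested above. The exception types caught here (`KeyError`, `ValueError`, `ImportError`) are illustrative assumptions — substitute whatever `LLM.from_config()` actually raises; the fake constructor body is a stand-in since the real one is not shown in this PR.

```python
import logging

logger = logging.getLogger(__name__)


def init_summary_llm(config: dict):
    """Catch only the failures we anticipate from LLM construction,
    instead of a blanket ``except Exception``."""
    try:
        # Stand-in for LLM.from_config(config): raises on missing config.
        if 'model' not in config:
            raise KeyError('model')
        return object()  # placeholder for the constructed LLM client
    except (KeyError, ValueError, ImportError) as e:
        # Anticipated failures only; anything else propagates loudly.
        logger.warning(f'Failed to init LLM for summary: {e!r}')
        return None
```

With this shape, a genuinely unexpected error (say, a `TypeError` from a bug) still surfaces instead of being silently logged.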
```python
content = msg.content if isinstance(msg.content, str) else str(
    msg.content)
if content:
    conv_parts.append(f'{role}: {content[:2000]}')
```
The value 2000 for truncating message content is a "magic number". To improve readability and maintainability, it should be defined as a named constant at the module level (e.g., SUMMARY_CONTENT_TRUNCATE_LIMIT = 2000) or as a configurable class attribute initialized in __init__. This makes the code's intent clearer and simplifies future modifications to this limit.
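A sketch of the suggested refactor, with the constant name `SUMMARY_CONTENT_TRUNCATE_LIMIT` taken from the comment above (it is a suggestion, not an identifier that exists in the PR):

```python
# Module-level constant replacing the magic number 2000.
SUMMARY_CONTENT_TRUNCATE_LIMIT = 2000


def format_for_summary(role: str, content: object) -> str:
    """Truncate one message's content before feeding it to the summarizer."""
    text = content if isinstance(content, str) else str(content)
    return f'{role}: {text[:SUMMARY_CONTENT_TRUNCATE_LIMIT]}'
```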
```python
except Exception as e:
    logger.error(f'Summary generation failed: {e}')
```
This try...except Exception block is too broad. It can hide various issues from the LLM generation call, such as network errors, API key problems, or rate limits. Please consider catching more specific exceptions if the LLM client library provides them, and handle them accordingly. This will make your error handling more precise and robust.
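One hedged way to narrow this handler while keeping the compressor safe: catch the network-style errors you expect and fall back to the uncompressed messages. `ConnectionError`/`TimeoutError` are stand-ins for whatever the real LLM client raises, and `summarize_or_fallback` is a hypothetical helper, not code from this PR.

```python
import logging

logger = logging.getLogger(__name__)


def summarize_or_fallback(messages, generate):
    """On an anticipated summarization failure, log it and return the
    original messages; unexpected exceptions still propagate."""
    try:
        return generate(messages)
    except (ConnectionError, TimeoutError) as e:
        logger.error(f'Summary generation failed: {e!r}')
        return messages
```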
```python
if last_user.content and last_user.content != result[-1].content:
    result.append(last_user)
```
The condition last_user.content != result[-1].content is almost always true because result[-1].content is a formatted summary string, making this check redundant. If the intent is to append the last user message if it has content, the check can be simplified.
Suggested change:

```diff
-if last_user.content and last_user.content != result[-1].content:
-    result.append(last_user)
+if last_user.content:
+    result.append(last_user)
```
```python
msg = Message(
    role=msg.role,
    content='[Output truncated to save context]',
    tool_call_id=msg.tool_call_id,
```
Is keeping only `tool_call_id` enough? When constructing the new `Message`, won't dropping fields such as `tool_calls` affect downstream processing?
This path trims tool results (role='tool' content only).
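If field loss ever becomes a concern, one defensive option is to copy every field and overwrite only `content`. The `Message` dataclass below is a minimal stand-in for the project's real class (its field names are assumptions for illustration):

```python
from dataclasses import dataclass, field, replace
from typing import Optional


# Minimal stand-in for the project's Message class.
@dataclass
class Message:
    role: str
    content: str
    tool_call_id: Optional[str] = None
    tool_calls: list = field(default_factory=list)


def truncate_tool_output(msg: Message) -> Message:
    """Copy all fields and overwrite only `content`, so attributes like
    `tool_calls` survive for downstream consumers."""
    return replace(msg, content='[Output truncated to save context]')
```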
```python
    return 0
return len(text) // 4


def estimate_message_tokens(self, msg: Message) -> int:
```
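For context, the snippet above uses the common "about four characters per token" heuristic for English text rather than a real tokenizer. A standalone sketch of that estimate:

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: ~4 characters per token, a cheap heuristic
    that avoids pulling in a tokenizer just for overflow detection."""
    if not text:
        return 0
    return len(text) // 4
```

This trades accuracy for speed; a tokenizer-backed count would be needed if the limit must be exact.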
Is there a reason for not directly using the token-usage information already on the `Message` class, such as `prompt_tokens` / `completion_tokens`?
```python
usable = self.context_limit - self.reserved_buffer
return total >= usable


def prune_tool_outputs(self, messages: List[Message]) -> List[Message]:
```
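The overflow check above reserves part of the context window for the model's reply. A self-contained sketch of that logic (parameter names mirror the snippet's attributes):

```python
def is_overflow(total_tokens: int, context_limit: int,
                reserved_buffer: int) -> bool:
    """The usable window is the context limit minus a buffer reserved
    for the model's response; compression triggers once the running
    total reaches it."""
    usable = context_limit - reserved_buffer
    return total_tokens >= usable
```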
Would it be better to unify the style of the prune operation in this function? For example, follow the style used inside LLMAgent and modify messages in place via `msg.content = ...`.
Right now the entries that need no trimming reuse the old objects, while the entries that get trimmed are newly created objects, so the returned list mixes the two kinds and the scope of any later modification is not particularly clear.
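A sketch of the in-place variant suggested above, so the returned list contains only the original objects. The `Message` dataclass and the `max_len` threshold are illustrative stand-ins for the project's real types and limits:

```python
from dataclasses import dataclass
from typing import List, Optional


# Minimal stand-in for the project's Message class.
@dataclass
class Message:
    role: str
    content: str
    tool_call_id: Optional[str] = None


def prune_tool_outputs_inplace(messages: List[Message],
                               max_len: int = 200) -> List[Message]:
    """Mutate oversized tool outputs in place (msg.content = ...), so
    every element of the returned list is the caller's original object
    and later edits have an unambiguous scope."""
    for msg in messages:
        if msg.role == 'tool' and len(msg.content) > max_len:
            msg.content = '[Output truncated to save context]'
    return messages
```

The trade-off: in-place mutation is simpler to reason about here, but it means callers holding references to the list see the truncation too, which the copy-based version avoids.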
Change Summary
Related issue number
Checklist
Run `pre-commit install` and `pre-commit run --all-files` before git commit, and passed lint check.